Multi-Label Approaches to Web Genre Identification

Authors

  • Vedrana Vidulin
  • Mitja Lustrek
  • Matjaz Gams

Abstract

A web page is a complex document that can share the conventions of several genres or contain several parts, each belonging to a different genre. A recent proposal for handling this interplay of genres in automatic web genre identification is multi-label classification. The dominant approach to such classification transforms one multi-label machine learning problem into several sub-problems, each learning a binary single-label classifier for one genre. In this paper we explore the multi-class transformation, in which each combination of genres is assigned a single distinct label. We then compare this approach with the binary one to determine which better captures the multi-label aspect of web genres. Experimental results show that neither approach properly handled multi-genre web pages; the observed differences stemmed from variations in the recognition of single-genre web pages.
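The two problem transformations compared above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the genre names are hypothetical, and only the conversion of multi-label training targets is shown (the single-label learners trained on those targets are omitted). `binary_transform` builds one 0/1 target vector per genre, while `multiclass_transform` assigns one class id to each distinct genre combination.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Genre = str  # hypothetical genre names are plain strings in this sketch


def binary_transform(labels: List[Set[Genre]], genres: List[Genre]) -> Dict[Genre, List[int]]:
    """Binary transformation: one 0/1 target vector per genre.

    Each vector is the training target for a separate binary single-label classifier.
    """
    return {g: [1 if g in page else 0 for page in labels] for g in genres}


def decode_binary(predictions: Dict[Genre, int]) -> Set[Genre]:
    """Combine the per-genre binary outputs back into a set of genres."""
    return {g for g, p in predictions.items() if p == 1}


def multiclass_transform(
    labels: List[Set[Genre]],
) -> Tuple[List[int], Dict[int, FrozenSet[Genre]]]:
    """Multi-class transformation: each distinct genre combination becomes one class.

    Returns the class id of every page and a mapping from class id back to
    the genre combination it stands for.
    """
    combo_to_id: Dict[FrozenSet[Genre], int] = {}
    targets: List[int] = []
    for page in labels:
        combo = frozenset(page)  # order-independent key for the combination
        if combo not in combo_to_id:
            combo_to_id[combo] = len(combo_to_id)  # ids assigned in order of first appearance
        targets.append(combo_to_id[combo])
    id_to_combo = {i: c for c, i in combo_to_id.items()}
    return targets, id_to_combo


if __name__ == "__main__":
    # Illustrative (invented) labels for four web pages.
    pages = [{"blog"}, {"blog", "personal"}, {"shop"}, {"blog", "personal"}]
    genres = ["blog", "personal", "shop"]
    print(binary_transform(pages, genres))
    print(multiclass_transform(pages))
```

One consequence of the multi-class design is visible in `id_to_combo`: a classifier trained on these targets can only output genre combinations that occurred in the training data, whereas the binary approach can, in principle, output any subset of genres.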

Similar resources

A Combination based on OWA Operators for Multi-label Genre Classification of web pages

This paper presents a new method for genre identification that combines homogeneous classifiers using OWA (Ordered Weighted Averaging) operators. Our method uses character n-grams extracted from different information sources such as URL, title, headings and anchors. To deal with the complexity of web pages, we applied MLKNN as a multi-label classifier, in which a web page can be affected by mor...

Single and Multi Column Neural Networks for Content-based Music Genre Recognition

This working note reports approaches of team KART to MediaEval2017 AcousticBrainz Genre Task and their results. To solve the problem, we mainly considered the sparsity and noise of data, network design for the multi-label classification, and implementation of successful Deep Neural Network (DNN) models. We propose three steps of preprocessing and depict two different approaches: a single-column...

Towards a Reference Corpus of Web Genres for the Evaluation of Genre Identification Systems

We present initial results from an international and multi-disciplinary research collaboration that aims at the construction of a reference corpus of web genres. The primary application scenario for which we plan to build this resource is the automatic identification of web genres. Web genres are rather difficult to capture and to describe in their entirety, but we plan for the finished referen...

Web-Mediated Genres – A Challenge to Traditional Genre Theory

This paper explores the possibility of extending the functional genre model to account for non-linear, multi-modal, web-mediated documents. It adds a two-dimensional perspective to the genre analysis model in order to account for the fact that web documents not only act as text but also as medium. A substantial part of the paper is devoted to a discussion of the function of links; mainly becaus...

Journal:
  • JLCL

Volume 24

Publication date: 2009